Discosuite - A parser test suite for German discontinuous structures
نویسندگان
چکیده
Parser evaluation traditionally relies on evaluation metrics which deliver a single aggregate score over all sentences in the parser output, such as PARSEVAL. However, for the evaluation of parser performance concerning a particular phenomenon, a test suite of sentences is needed in which this phenomenon has been identified. In recent years, the parsing of discontinuous structures has received a rising interest. Therefore, in this paper, we present a test suite for testing the performance of dependency and constituency parsers on non-projective dependencies and discontinuous constituents for German. The test suite is based on the newly released TIGER treebank version 2.2. It provides a unique possibility of benchmarking parsers on non-local syntactic relationships in German, for constituents and dependencies. We include a linguistic analysis of the phenomena that cause discontinuity in the TIGER annotation, thereby closing gaps in previous literature. The linguistic phenomena we investigate include extraposition, a placeholder/repeated element construction, topicalization, scrambling, local movement, parentheticals, and fronting of pronouns.
منابع مشابه
Discontinuous Verb Phrases in Parsing and Machine Translation of English and German
In this paper, we focus on the verb-particle (V-Prt) split construction in English and German and its difficulty for parsing and Machine Translation (MT). For German, we use an existing test suite of V-Prt split constructions, while for English, we build a new and comparable test suite from raw data. These two data sets are then used to perform an analysis of errors in dependency parsing, word-...
متن کاملParsing String Generating Hypergraph Grammars
A string generating hypergraph grammar is a hyperedge replacement grammar where the resulting language consists of string graphs i.e. hypergraphs modeling strings. With the help of these grammars, string languages like anbncn can be modeled that can not be generated by context-free grammars for strings. They are well suited to model discontinuous constituents in natural languages, i.e. constitu...
متن کاملA Testsuite for Testing Parser Performance on Complex German Grammatical Constructions
Traditionally, parsers are evaluated against gold standard test data. This can cause problems if there is a mismatch between the data structures and representations used by the parser and the gold standard. A particular case in point is German, for which two treebanks (TiGer and TüBa-D/Z) are available with highly different annotation schemes for the acquisition of (e.g.) PCFG parsers. The diff...
متن کاملAn HPSG Parser Supporting Discontinuous Licenser Rules
We present an HPSG parser which can process discontinuous licenser rules in grammars with continuous constituents. We demonstrate how licenser rules can be used to reduce the parser’s search space by applying them to the analysis of German main clauses with a right sentence bracket and to complement extraposition. Before discussing the parser and the licenser rule technique, we would like to re...
متن کاملAn Approach for Resource Sharing in Multilingual NLP
In this paper we describe an approach for the analysis of documents in German and English with a shared pool of resources. For the analysis of German documents we use a document suite, which supports the user in tasks like information retrieval and information extraction. The core of the document suite is based on our tool XDOC. Now we want to exploit these methods for the analysis of English d...
متن کامل